Search for: All records

Creators/Authors contains: "Vinayak, Ramya"

  1. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
  2. This paper investigates simultaneous preference and metric learning from a crowd of respondents. A set of items represented by d-dimensional feature vectors and paired comparisons of the form "item i is preferable to item j" made by each user are given. Our model jointly learns a distance metric that characterizes the crowd's general measure of item similarities along with a latent ideal point for each user reflecting their individual preferences. This model has the flexibility to capture individual preferences, while enjoying a metric learning sample cost that is amortized over the crowd. We first study this problem in a noiseless, continuous response setting (i.e., responses equal to differences of item distances) to understand the fundamental limits of learning. Next, we establish prediction error guarantees for noisy, binary measurements such as may be collected from human respondents, and show how the sample complexity improves when the underlying metric is low-rank. Finally, we establish recovery guarantees under assumptions on the response distribution. We demonstrate the performance of our model on both simulated data and on a dataset of color preference judgements across a large number of users.
  3. We study the problem of estimating the distribution of effect sizes (the mean of the test statistic under the alternative hypothesis) in a multiple testing setting. Knowing this distribution allows us to calculate the power (one minus the type II error rate) of any experimental design. We show that it is possible to estimate this distribution using an inexpensive pilot experiment, which takes significantly fewer samples than would be required by an experiment that identified the discoveries. Our estimator can be used to guarantee the number of discoveries that will be made using a given experimental design in a future experiment. We prove that this simple and computationally efficient estimator enjoys a number of favorable theoretical properties, and demonstrate its effectiveness on data from a gene knockout experiment on influenza inhibition in Drosophila.
  5. Consider a setting with N independent individuals, each with an unknown parameter p_i in [0,1] drawn from some unknown distribution P*. After observing the outcomes of t independent Bernoulli trials per individual, i.e., X_i ~ Binomial(t, p_i), our objective is to accurately estimate P*. This problem arises in numerous domains, including the social sciences, psychology, healthcare, and biology, where the size of the population under study is usually large while the number of observations per individual is often limited. Our main result shows that, in the regime where t << N, the maximum likelihood estimator (MLE) is both statistically minimax optimal and efficiently computable. Precisely, for sufficiently large N, the MLE achieves the information-theoretically optimal error bound of O(1/sqrt(t log N)) for N < exp(t), and O(1/t) for N > exp(t), with respect to the L1 distance between the true CDF and the estimated CDF.
  7. Societal-scale data is playing an increasingly prominent role in social science research; examples from research on geopolitical events include questions on how emergency events impact the diffusion of information or how new policies change patterns of social interaction. Such research often draws critical inferences from observing how an exogenous event changes meaningful metrics like network degree or network entropy. However, as we show in this work, standard estimation methodologies make systematically incorrect inferences when the event also changes the sparsity of the data. To address this issue, we provide a general framework for inferring changes in social metrics when dealing with non-stationary sparsity. We propose a plug-in correction that can be applied to any estimator, including several recently proposed procedures. Using both simulated and real data, we demonstrate that the correction significantly improves the accuracy of the estimated change under a variety of plausible data generating processes. In particular, using a large dataset of calls from Afghanistan, we show that whereas traditional methods substantially overestimate the impact of a violent event on social diversity, the plug-in correction reveals the true response to be much more modest. 
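
As an illustration of the binomial setting in the abstract above (N individuals, t Bernoulli trials each, X_i ~ Binomial(t, p_i)), the following is a minimal sketch of one common way to approximate a maximum likelihood estimate of P*: restrict P* to weights on a fixed grid in [0, 1] and fit the weights with EM. The grid discretization, the EM solver, and the names `binomial_mixture_mle`, `grid_size`, and `n_iter` are assumptions made for this example and are not taken from the listed paper.

```python
import numpy as np
from math import comb


def binomial_mixture_mle(x, t, grid_size=101, n_iter=500):
    """Approximate the MLE of the mixing distribution P* on a fixed grid.

    x : success counts, one per individual, each in {0, ..., t}
    t : number of Bernoulli trials per individual
    Returns (grid, weights), where weights[k] estimates the mass at grid[k].
    """
    x = np.asarray(x, dtype=int)
    grid = np.linspace(0.0, 1.0, grid_size)

    # The likelihood depends on the data only through how often each count occurs.
    counts = np.bincount(x, minlength=t + 1).astype(float)
    observed = counts > 0  # skip counts that never occur (avoids 0/0 below)

    s = np.arange(t + 1)[:, None]  # possible success counts, as a column
    coef = np.array([comb(t, k) for k in range(t + 1)], dtype=float)[:, None]
    # lik[s, k] = P(X = s | p = grid[k]); NumPy evaluates 0.0 ** 0 as 1.0.
    lik = coef * grid[None, :] ** s * (1.0 - grid[None, :]) ** (t - s)
    lik = lik[observed]
    counts = counts[observed]

    w = np.full(grid_size, 1.0 / grid_size)  # start from uniform weights
    for _ in range(n_iter):
        mix = lik @ w                                # marginal probability of each count
        post = lik * w[None, :] / mix[:, None]       # E-step: posterior over grid points
        w = counts @ post / counts.sum()             # M-step: average the posteriors
    return grid, w


if __name__ == "__main__":
    # Hypothetical simulated population: half the individuals have p = 0.2, half p = 0.8.
    rng = np.random.default_rng(0)
    p_true = rng.choice([0.2, 0.8], size=2000)
    x = rng.binomial(10, p_true)
    grid, w = binomial_mixture_mle(x, t=10)
    print("estimated mass below 0.5:", w[grid < 0.5].sum())
```

The estimated CDF is the cumulative sum of the returned weights, which is the quantity the abstract's L1 error bound refers to. A finer grid or more EM iterations trade computation for a closer approximation to the unrestricted MLE.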